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Abstract 

This paper is motivated by a computer experiment conducted for optimizing resid- 
ual stresses in the machining of metals. Although kriging is widely used in the analysis 
of computer experiments, it cannot be easily applied to model the residual stresses 
because they are obtained as a profile. The high dimensionality caused by this func- 
tional response introduces severe computational challenges in kriging. It is well known 
that if the functional data are observed on a regular grid, the computations can be 
simplified using an application of Kronecker products. However, the case of irregular 
grid is quite complex. In this paper, we develop a Gibbs sampling-based expectation 
maximization algorithm, which converts the irregularly spaced data into a regular grid 
so that the Kronecker product-based approach can be employed for efficiently fitting a 
kriging model to the functional data. 

KEY WORDS: EM Algorithm; Gaussian Process Model; Gibbs sampling; Kriging; Latin 
Hypercube Design; Optimization. 
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1. INTRODUCTION 



Computer experiments (Santner et al., 2003; Fang et al., 2006) refer to those experiments 
that are performed using computers with the help of physical models and numerical methods, 
such as finite element analysis. Many computer experiment responses are collected in a 
functional form. That is, for each setting of the experiment, responses are collected over an 
interval of some index, such as space or time. The study of functional data is important 
because it can help us understand how the factors affect the shape of the resulting curve, 
which can have a bearing on the performance of the object or system under investigation. 

This paper is motivated by a computer experiment in the machining of metals using a 
hard turning process. The main objective of the experiment is to study the residual stresses 
generated in the machined surface because they are known to influence the fatigue life and 
are also associated with distortion in machined parts. Nine machining process variables are 
considered in this simulation experiment and for each setting, the residual stress profiles over 
the depth at three different locations in the machined part are generated. Specifically, each 
residual stress profile measured in the tool feed rate direction contains 376 residual stress 
values output by the simulation code over a depth of 376 microns measured from the part 
surface (i.e. one residual stress value per micron). Figure [l] illustrates five sample residual 
stress profiles obtained under different machining process settings. We can see that the 
residual stress profiles vary with the settings. Therefore, it is important to predict the effect 
of the process variables on the residual stress profile so that the machining process can be 
optimized to enhance the fatigue life of machined components. 

The literature on modeling computer experiments with functional responses is scant as 
most of the modeling techniques focus on single or multiple outputs (Conti et al. 2009, 
Conti and O'Hagan 2010). Kriging (Sacks et al. 1989) is the most popular technique for 
modeling these data due to its interpolating property, which is desirable in deterministic 
computer experiments. However, kriging is not used for analyzing functional data because 
of the computational problems caused by the high dimensionality of functional responses, 
especially when they are collected with an intensive sampling rate. Techniques such as 
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Figure 1: Five randomly sampled residual stress profiles vary with the settings. 

wavelet decomposition (Bayarri et al. 2007), principal component analysis (PC A) (Ramsay 
and Silverman 2005, Higdon et al. 2007), functional linear models (Fang et al. 2006), 
and knot-based Gaussian process models (Banerjee et al. 2008) have been used instead. 
Although they provide simple and fast solutions, these models cannot interpolate the data. 
This creates a mismatch in the analysis methods for single and functional outputs. Thus, 
the main objective of this work is to improve the computational efficiency of kriging in the 
analysis of functional data so that the same method can be applied irrespective of the type 
of the response. 

A naive extension of kriging to functional response is to include the functional argument 
as an additional input to the model (Kennedy and O'Hagan 2001, Liu and West 2009). 
For example, the machining experiments are conducted with nine process variables and a 
spatial variable that indicates the locations of the residual stress measurements. The kriging 
model can be fitted by incorporating depth as the 11th variable. Although this method is 
simple, it suffers from the computational difficulties. This is because the maximum likelihood 
estimation of the correlation parameters in the kriging model involves the inversion and 
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determinant calculation of a correlation matrix, whose dimension increases with the total 
number of observations N = nm, where n is the run size of the experiment, and m is 
the number of observations (from the functional space) in each run. These large matrix 
operations make the estimation of the correlation parameters computationally intensive and 
numerically unstable (Joseph and Kang 2011). Take the 90-run machining experiment as 
an example; the inversion and determinant calculation of a 33840 x 33840 matrix (n = 90, 
m = 376, and N = 33840) is required at each iteration of the optimization algorithm used 
in the maximum likelihood estimation, which will make the optimization extremely time- 
consuming. 

One approach to overcome the computational issues associated with the naive kriging ex- 
tension is to apply a Kronecker product formulation for constructing the correlation matrices 
(Williams et al. 2006, Rougier 2008, Liu et al. 2008, Bayarri et al. 2009). However, this 
approach can only be used when the functional responses are collected over a regular grid, 
which means that the outputs are observed at the same locations in the functional space for 
all runs. 

Although not as common as the case of regular grid, observations on nonregular grid also 
occur sometimes in practice. For examples, in the study of transient rolling adhesion and 
deformation of leukocytes, the displacement profile of a cell is simulated using computational 
fluid dynamic models. These profiles are often truncated irregularly at various time points 
due to computational constraints at different kinetic parameter settings (Pappu and Bagchi 
2008). In the thermo- mechanical study of the friction drilling processes, the thrust force 
profiles are generated from finite element modeling over the distance the tool travels. The 
profiles are irregularly collected because the travel distances are different at different values 
of tool feed rate (Miller and Shih 2007). In the motor engine simulations reported in Liu et al. 
(2008), the acceleration profiles are truncated at different time points leading to functional 
data on an irregular grid. Such truncated profiles are also commonly seen in degradation 
studies and fatigue testing simulations because the profile data cannot be collected when the 
product fails. 

In this work, we propose a general and efficient method to overcome the computational 
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issue in analyzing functional response, especially with irregular grid. The paper is organized 
as follows. In Section 2, a brief review of the kriging model is given. The new modeling 
procedure for functional response in computer experiments is developed in Section 3. The 
proposed method is illustrated using the machining experiment in Section 4. Summary and 
concluding remarks are given in Section 5. 

2. Kriging Preliminaries 

Suppose that the computer experiment is conducted using p variables x — (x±, • • • ,x p )' 
and the functional responses are collected over an index t. For each experimental setting 
Xi G TZ P , i = 1, • • • ,n, the outputs y i = (yn, ■ ■ ■ ,yi mi )' are assumed to be a vector which 
is observed over tn, ■ ■ ■ ,t imi with t, = (tn, ■ ■ ■ ,t imi )'. Note that the functional responses 
for each run can be collected differently in terms of location, therefore, tj's and m^'s are not 
necessarily the same for every i. Then, the kriging model is given by 

y(x,t)=v(x,t)'n + Z(x,t), (1) 

where y(x, t) is the response at point t and input setting x, v(x, t)' = (v (x,t),v 1 (x, £),••• , v L (x, t)) 
is a set of known functions (usually, v (x,t) = 1), and fi is a vector of unknown parame- 
ters. We assume that Z(x, t) is a Gaussian process with mean and the covariance function 
cov{y(xi, ti), y{xii £2)} — <y 2 r(x\ — x 2 ,ti — t-z). Furthermore, it is common to assume a sep- 
arable product correlation function: r{x\ — x 2 ,t 1 — t 2 ) = (nf =1 rj(xji — Xi 2 ))r T (t 1 — t 2 ), where 
Ti{xi\ — x i2 ) is the correlation function for the ith variable and r T {ti — t 2 ) is the correlation 
function for variable t. Unknown parameters associated with these correlation functions are 
denoted by Note that the spatial-temporal model (Cressie 1993, Fang et al. 2006) is a 
special case of this model where x represents space and t represents time. 

Suppose the N x 1 vector y = (y[, • • • , y' n )' is a collection of all the functional outputs, 
where N = Yll m i- The experimental design is denoted by the N x p matrix X = (1^ <g> 
£Ci, • • • , 1' <8>x n )' = (X 1; • • • , Xjv)', where (8) denotes the Kronecker product operator, l m . is 
a column of l's having length m i: and T = (t' 1? • • • , t^)' = (i*, , • • • , t* N )' is the corresponding 
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N x 1 vector on the functional space. Based on model (IT]), the universal kriging predictor is 
given by 

y(x, t) = v(x, t)'fi + r(x, t)'R^ t (y - V£), (2) 

where V = (v(X a , tj), • • • ,v(X N ,t* N ))' , A = (V'R^V^V'Rx^, r(aj,t) = (rfc-Xx,*- 
£*),••• ,r(x — Xtv,^ — t* N ))', and Rx,t is the N x N correlation matrix with elements 
r(Xj — Xj,t| — if:*). Note that, the kriging predictor in (|2| interpolates the data because 

= yij. The correlation parameters are estimated 
by minimizing the negative log-likelihood (Santner et al. 2003, pp. 66) 
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Moga 2 + log|R x ,t| 



(3) 



where a 2 = jr(y — V/i) / R x 1 t (?/ — V/i). Other approaches, such as cross-validation and 
restricted maximum likelihood (Santner et al. 2003, Section 3.3) can also be applied. 

As mentioned before, an optimization algorithm employed in ^ may require hundreds of 
evaluations of R x \ and |Rx,t|, which makes the estimation of correlation parameters com- 
putationally prohibitive. To tackle the computational difficulty associated with functional 
response and successfully extend kriging, we propose a new procedure in the next section. 



3. Analysis of Functional Response 
3.1 The first stage 

We propose to build the kriging model in two stages. In the first stage, we use the 
marginal profiles in x and t to identify the functional form of the mean v(x,t) and obtain 
the initial estimates of the correlation parameters. It can be described as follows. Consider 
the model 

y(x, t) = (JL + k\t)u t + g'(x)u x + Z(x, t), (4) 

where k(t) = (ki(t),--- ,k a (t))' and g(x) = (gi(x),--- ,gb(x))'. This is a special case of 
(JT]) because the mean does not entertain the possible interactions between x and t. These 
interactions are now absorbed into Z(x,t). 
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First, we compute the average profile. Assume that the functional outputs are collected 
from m different locations of the functional space. If the functional data are collected over 
an irregular grid, m is defined to be the union of the abscissae of the individual profiles. 
We assume that there are rij observations for location j, j = 1, • • • , m. For each j, define 
e.j = n^Y^iLiiVij — Vi)-, where yi. = rn^YH^iVu- Note that we removed the effect of 
re-variables by subtracting y L from y^. Based on e.i, • • • , e. m , we fit the model 

e{t)=fi t0 + k'{t)u t + Z{t). (5) 

Next, we compute the average responses over the functional space, i.e., £/i., • • • ,y n ., and fit 
the model 

y(x) = jj, x0 +g'(x)v x + Z(x). (6) 

The unknown functions in k(t) and g(x) can be identified using a variable selection proce- 
dure, such as the Bayesian forward selection procedure implemented in blind kriging (Joseph 
et al. 2008). For the correlation functions, we consider the widely used power exponential cor- 
relation functions r^x^— x i2 ) — ex p{~ a i\ x n~ x i2\ d } force and r T (t 1 —t 2 ) = exp{— /3\ti — t 2 \ d } 
for t, where d — 1 and d = 2 correspond to the exponential and Gaussian correlation func- 
tions, respectively. The estimates of the correlation parameters are denoted by = 
(afV- - ,4 0) )' and p<® , which are used as initial values in the second stage estimation. 

The second stage, described in Sections 3.2 and 3.3, is implemented depending on how 
the functional outputs are collected. 

3.2 The second stage: regular grid 

Suppose the functional output is collected over a regular grid, i.e., t\ — • • • — t n — t 
and mi = ■ ■ ■ = m n = m. Thus, in this situation, the functional outputs are observed at 
the same locations of the functional space for each experimental setting. It follows that 
Rx t = Rx®Rt, where Rx is a n-by-n correlation matrix corresponding to x with elements 
r(Xi — Xj) and R t is a m-hj-m correlation matrix according to the t components (Williams 
et al. 2006, Rougier 2008, Liu et al. 2008, Bayarri et al. 2009). We have R x | t = R x x ® 
RJT 1 . Consequently, the computational complexity involved in finding the inverse of Rx t 
is dramatically reduced from 0(n 3 m 3 ) to 0(n 3 + m 3 ) (An and Owen 2001). The resulting 
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kriging predictor can be written as 



y(x, t) = v{x, t)'fi + r(x, t)' (r^ 1 ® R t ^ (y - V£), 



(7) 



where v(x,t) = (l,k'(t),g'(x))' and V = (u(Xi, t{), ■ ■ • , v(X N , t* N ))'. The maximum like- 
lihood estimate (MLE) of the parameters are given by 



t 1 



V ( R-x <8> R^ 1 ) y 



&2 = ^(y- V A)' (Rx 1 ® R^ 1 ) (y - V£), 
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(8) 

(9) 
(10) 



Moga 2 + mlog|Rx| + nlog|R t | 

The computational complexity can be further reduced using an idea in Finley et al. 
(2009), provided the grids are equally spaced. This is especially useful when the functional 
responses are collected with intensive sampling rate. Without loss of generality, assume that 
the equal spacing functional outputs are observed at locations t = (1,2, •• • , m), where m 
can be extremely large. In this case, the evaluation of the kriging predictor can be simplified 
by using an exponential correlation function (d — 1). The use of exponential correlation 
function results in a non-differentiable predictor, however, lower degrees of smoothness can 
still be maintained for large m. The correlation matrix corresponding to t can be written as 



Rt 



1 

P 



P 
1 



P* 
P 



„m— 1 



pin p m ~ 1 p m ~^ 



where p = exp(— f$). It follows that |R t | = (1 — p 2 ) m and 



Rr 1 ^ 



i - p 1 



1 - P o ••• o 

-p l+p 2 -p ■■■ 








-p l + p 2 -p 

-p 1 
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Because is highly sparse and |Rt| can be written in a simple closed form, the evaluation 
of kriging predictor ([7| becomes much easier with a computational complexity of 0(n 3 +mn 2 ). 
3.3 The second stage: irregular grid 

Consider a more general situation where the functional responses are collected over an 
irregular grid. That is, for the iih experimental setting, functional response y i is observed at 
tj = (til-, ■ ■ ■ i timi)' i where tj's and the lengths m/s are not necessarily the same for every i, 
i = 1, • • • , n. In this situation, the Kronecker product technique, which is the key to simplify 
the computation, cannot be directly utilized. We propose an intuitive idea to overcome the 
computational problem, which is to fill-in the "missing observations" so that the data appear 
to be on a regular grid and apply the procedure developed in the previous subsection. Even 
though the issue of missing data has been extensively studied in the literature, the idea of 
creating such missing observations to overcome the computational difficulty in kriging is new. 
Moreover, a direct application of the Expectation Maximization (EM) algorithm (Dempster 
et al. 1977) does not help in reducing the computational complexity. Here we introduce a 
carefully constructed iterative algorithm within the framework of EM algorithm to efficiently 
estimate the missing observations. 

Define the missing data by z = (z\, • • • , z n )', where Zj is a vector of missing observations 
in the ith run. Combining z and the observed data y = (y^, ■ ■ ■ , y' n )', we obtain the complete 
dataset c = (c[, • • • , c' n )' with Cj = (y[ U z^)', which can be viewed as the data collected on 
a regular grid t = (t 1; • • • t m )' with missing data z, t on t\tj. The E-step in the EM algorithm 
is to obtain 

Q(0\b ik) ) = E(lc(0)\y,d {k) ), (12) 

where 

l c {6) = ^log(2n) + ^log(a 2 ) + hog\R x ,\ + ^(c - V»)>R^ t (c - Vp) (13) 

is the negative log-likelihood of the complete data and 6 the param- 

eters estimated at the fcth EM iteration. Clearly, the E-step in (Jl2|) requires the computation 
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of 



E((c - V^)'R x ; t (c - V»|y, 6 ( ) = (E(c\y, 6 ( ') - V^'n^ {E(c\y, ') - Vfi) 



contain the inverse of the correlation matrix constructed using the observations on an ir- 
regular grid. This is the same computational hindrance we faced before. Therefore, such a 
direct application of EM cannot overcome the computational issue. 

To simplify the computation, we modify the standard EM approach using an idea similar 
to Monte Carlo EM algorithm (Chan and Ledolter 1995). The main idea is to estimate the 
missing data run-by-run by conditioning on the complete data in the other runs. Let z\ be 
the missing data estimated in the jth iteration and c 3 ' = (c{, ■ ■ ■ , eQ, where = {y i U z{) 
are interspersed onto the regular grid t. We can sample the missing data by using a Gibbs 
sampler according to the conditional distributions 




(15) 



f(cn\y,z{, • ■ • 




with z 3 ,* = (z{,-- - , z\_n z 3 i+1 , ■ ■ ■ ,z 3 n ). Because each step in (15) involves data from a 



regular grid, the computations are cheap. This is shown below. 
The prior distribution (Currin et al. 1991) for Cj is 



/(q|6> ( ViV^ScJ, (16) 



where £ c . = (u(aji,*i), • • • 



v(xi,t m )) fj, and S Cl 



o" 2 Rt. It follows that 



/( 



(17) 



/(Ci|l/(_i)> Z M)> 6 ) ~ N (t(-i)> S M)) > 



(18) 
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where 



c, 



Cci + ( r t t (ti), • • • , r tl (t OT ))'R t / [Vi - tii), • • • , v(xi, t imi )) V] , 

S Ci - ^K(ti), • • • , Mt^Ht-Wti), • • • , r ti (t m )), 
C Cl + {r^ix^tt),--- ,r(-i)(xi,t m ))' (r^^ ®RtM (c ( _ i} - Cc ( _ 4) ), 
S Cl - o- 2 (r ( _i)(^,ti), • • • ^(-^(aji^m))' 
R x^ i} ® R* 1 ^ (** ( _<j(a5i, ti), ■ ■ ■ ,r(_<)(x 4 ,t m )), 



r t>(^) = (?"t(^ - tii), • • • ,r T (ij- - t imi ))', Cc 



i-i) 



E(c(_i)), Rx ( _ 1) is the (n 



i) x (n - r 



correlation matrix corresponding to a? except the zth setting, yr-{\ is the observed data 
except the ith profile, is the missing data except those from the ith run, and C(_j) = 
U Then, we obtain the following result. 



Proposition 1: Based on (16), (17), and (18), it follows that 



where 



Sc, 1 )- 1 - 



(19) 

(20) 
(21) 



The proofs are given in the Appendix. Based on (19), the conditional mean of Cj is a 
weighted average of three terms: the mean £ c . from prior distribution, the conditional mean 

* (A;) 

based on the ith observed profile <^ = E(ci\y, h ), and the conditional mean based on the 
complete data except the zth profile C(-*) = E( c i\ c {-i)i Q )■ The weights are determined 
by their corresponding variances, Sjv, Sj, and S(_^, which are also components in the 



conditional variance. Evaluating the conditional distribution in (19) is computationally 
efficient because £ c . and involve observations with size m and m 8 which are easy to 
calculate. This is also true for the variances Efv and Sj. Furthermore, because cr-i) is on 
a regular grid, Ct-i) an d ^(-*) can be easily evaluated. By simple modifications, results in 



Proposition 1 can be extended to the conditional distributions in (15). For example, the 



Ms 



posterior distribution f(ci\y,z{,z^_l_ 2 y0 y ') can be obtained by replacing with the 
updated missing data {z{,z^Zi_ 2 )) * n (19). 
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After generating q samples using the Gibbs sampler, we can now approximate Q{6\6 

Q^\0\0 {k) )= l -j^l CJ {6)- 



in (p| by 



Q 



(22) 



3=1 



However, if the Gibbs sampler needs too many samples to attain convergence, the compu- 
tational advantage of the foregoing procedure may diminish. Interestingly, we can further 
simplify the procedure and speed-up the convergence using the following idea. Instead of 
sampling from the conditional distributions, we can simply iterate on the conditional expec- 
tations as follows: 



E{z x \y,z j r\ v d 



E{z n \y,z{,---z ] n ^ l ,6 {l 



(23) 



where E(Zi\y, ) represents the expectation based on the conditional distribution 
f(ci\y, )• These conditional expectations can be easily computed using (20). It is 
worth noting that 



i-i) 



1 



and 



R x 





A(_j) B.i 




a- hi 



(24) 



— ^iRx ( _ 4) S i?, S R 

where sr = 1 — u'-R^ ^g i . Therefore, the evaluation of R^ . in (20) can be efficiently 
updated by 

Because of this simplification, the complexity in estimating the missing observations is suc- 
cessfully reduced from m i) 3 ) to 0(n 3 + m\ + m 3 + mn 2 ) per iteration and it can 



be further reduced to 0(n 3 + mn 2 ) by using the exponential correlation function (11) for 
the functional argument. The asymptotic convergence of z]'s follows from the proposition 
below. 
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Proposition 2: Assume that 



max,- 



dk 



Ik 



< 1, 



(25) 



where (d 



i) • • ' j 



(in) 



H) 



M-i)' 



D 



max\ 



S||2 = 



Ds 



and [ ■ ] 



ik 



indicates the block matrix with rows corresponding to the missing data at the zth run 
and columns corresponding to the missing data at the fcth run. Then for all i, || z\ — 
E(zi\y, ) || 2 converges to as m — > oo and q — > oo . 



The assumption in (25) can be interpreted intuitively. Recall that in Proposition 1 we 
have Si = S Ci -cr 2 (r t .(t 1 ), • • • , r u (t m ))' R^r^ti), • • • ,r t .(t m )). When = 0, the equal- 
ity Sj = Sc, holds; when m.j 7^ 0, |Sc, — Sj| > and it increases with m; for fixed 0. Simi- 
larly, it follows that IS^-E^I = |S£||<7 2 (r tt (*i), • • • , r ti (t m ))' R^K^i), • • • , r ti (t m ))||£ri| 



increases with rrii and consequently the left side of (25) decreases. Hence, this assumption 



suggests that the number of observations from the ith profile (mj) should be reasonably large 
in order to accurately estimate the missing data. 



The simplification made by (23) results in a new approximation to the E-step: 

Q^(0\0 {k) )^l cq (0), 



(26) 



where let is the negative log-likelihood evaluated in ( 13 ) with c replaced by c 



This is equivalent to estimating the missing observations and then pretending they were 
known. 

Now we can implement the M-step in the EM algorithm, which is to update the MLEs 

by 







(fc+i) 



argmim, Q (<?) (0|0 (fc) ) 



where Q^ g \0\0 ) is obtained from (26). Since the estimation herein is based on the complete 
data c q collected on regular grids, estimators according to ([8|, (|9j, and (10) can be computed 
efficiently. 

There are some parameters that need to be pre-specified in the EM procedure. The initial 
estimates of , /3 (0) ) can be obtained from the first stage of our model 
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building procedure. The initial estimates of the missing data can be obtained by adding 
the predictions from the two kriging models ^ and ([6]). Thus, the missing data of the ith 
profile, zf \ at point t, is given by 

y(x h t) = {L m +fL X Q+k(t)'u t +g(xi)'v x +r T (t)'Tlt l { e t- K 

where K = (fc(ti), ■ ■ ■ , k(t m ))', G = (g(xi), ■ ■ ■ ,g(x n ))', and R^ 1 and R^- 1 are evaluated 
at and a*- -*. The EM procedure is terminated when maxj 

|^(fc+i) _^(fc)| < Aj where the 

tolerance A can be chosen based on the required accuracy. 

Since the proposed approximation converges to the standard E-step when q — > oo, the 
convergence of this procedure is guaranteed based on the results in the EM literature. More 
details can be found in Wu (1983), Chan and Ledolter (1995), and Fort and Moulines (2003). 

4. RESIDUAL STRESS OPTIMIZATION 

In this section, we revisit the machining experiment originally performed by Hung et al. 
(2009) and apply the proposed method to analyze the functional outputs. The objective of 
the experiment is to optimize a turning process for hardened bearing steel. Because of the 
computational challenge, Hung et al. (2009) optimized only the cutting forces, which is a 
single output problem, whereas the main objective of the experiment was to optimize the 
hard turning process with respect to residual stress, which is a functional output and is the 
focus of this paper. 

Ten variables are considered in this experiment (Table [T]). A 30-run orthogonal-maximin 
branching Latin hypercube design (BLHD) (Hung et al. 2009) is constructed based on the 
first nine variables and is given in Table [2j The cutting edge shape is labeled "1" and "2" to 
stand for chamfer and hone, respectively. The second and third factor, chamfer angle and 
length, are divided into 15 levels and the setting corresponding to the hone edge (x\ = 2) 
is left blank because these two factors exist only when chamfer edge is considered. This is 
the main difference between BLHD and the traditional Latin hypercube design. The rest of 
the factors are set up at 30 levels. Since the cutting edge shape is a categorical variable, 
an isotropic correlation function (Hung et al. 2009) is used in the analysis. Exponential 
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Table 1: Factors and their ranges in the hard turning experiment 



Factor 


Ranges 


x\. Cutting edge shape 


chamfer or hone 


X2- Chamfer Angle (degree) 


17 ~ 20 


X3: Chamfer Length (jum) 


115 ~ 140 


X4: Cutting edge radius (jum) 


5-25 


X5: Rake angle (degree) 


-15 5 


Xq: Tool nose radius (mm) 


0.4 ~ 1.6 


xj\ Cutting speed (m/min) 


120 ~ 240 


Xs'- Feed rate (mm/rev) 


0.05 ~ 0.15 


Xg\ Depth of cut (mm) 


0.1 ~ 0.25 


x w : Location 


1,2,3 



correlation functions as defined in Section 3.1 are used for the rest of the factors. For 
each experimental setting, residual stresses are collected as a function of depth using a 
highly sophisticated finite element-based machining simulation software AdvantEdge (Third 
Wave Systems Inc., Minneapolis, MN). The simulations are computationally intensive which 
require 12 to 24 hours of running time for each experimental run. 

The residual stress profiles are originally collected on a regular grid, but to illustrate 
the procedure, we artificially truncated them (randomly from 100 to 376 microns) so that 



the profiles are observed over an irregular grid. The assumption (25) holds in general with 



such a truncation and the left hand side of (25) decreases with respect to the percentage 
of truncation. Figure [2] illustrates five such profiles. The average of the 90 residual stress 
profiles is shown in Figure [3j Theoretically, it is expected that the residual stress will tend 
towards as the depth increases. Therefore, a nonlinear model might fit this profile better. 
We fitted S = exp(—Xt)(fi + Hit + [i^t 2 ) for the averaged residual stress profile S using least 
squares and obtained an estimate of A as 0.005. Based on this result, we transformed the 
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Table 2: Orthogonal-maximin BLHD for the hard turning experiment 
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original functional data by multiplying exp(Xt) and used them for the model fitting. 




depth (microns) 



Figure 2: Truncated residual stress profiles. 

By fitting e(t) = /i t0 + k'(t)u t + Z(t) with k(t) = (t,t 2 )', we obtained /3 (0) = 1.24. The 
blind kriging procedure (Joseph et al. 2008) applied on the 30 average values did not suggest 
any particular trend in the x-variables; therefore we fitted an ordinary kriging model y(x) = 
Li x0 + Z(x) and obtained A (0) = (0.10,0.76,0.45,4.50,3.72,1.63,0.17,2.05,4.50,0.59)'. Now 
we are ready to move on to the second stage of the model building procedure. 

In the second stage, the proposed EM algorithm is applied. We applied 10 iterations 



(q = 10) using (23) inside each EM iteration. With A = 0.05, the EM algorithm terminated 
after 21 iterations. Figure [4] shows some of the missing data estimated by the procedure. 
They are reasonably close to the true observations (the mean squared prediction error is 
45.88), showing that the procedure works well. The maximum likelihood estimates of the 
correlation parameters are given by a=(1.34, 1.68, 1.47, 3.35, 2.80, 4.33, 1.82, 3.44, 4.09, 
1.52)' and /3 = 1.12. The computational saving obtained by the proposed EM procedure is 
quite substantial. The EM procedure took about 9.25 hours on a 3.4-GHz PC, whereas the 
naive kriging extension did not even converge after waiting for four days. 
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Figure 3: Average residual stress profile 
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Figure 4: Missing data estimated by EM. 
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The 100(1 — k)% confidence interval for the prediction is given by (Santner et al. 2003, 
pp. 93), 

y(x, t) ± Z K/2 a{l - r(x, t)'R^ t r(x, t) + h'(V'R x \ V^h} 1 ' 2 , (27) 

where Z K j<i is the upper k/2 critial point of the standard normal distribution and h = 
v(x,t) — V'H^ t r(x, t). The 95% confidence intervals are illustrated using six randomly 
selected profiles (Figure [5]). Each predicted curve was evaluated by leaving the original 
profile out from the training set. Note that we used a small nugget term to overcome the 
near-singularity in evaluating the large matrix inverse R-xt (Gramacy and Lee 2011). The 
construction of the confidence intervals took about seven hours in a 2.33 GHz computer. 
This is not a big concern because we need to evaluate the matrix only once for constructing 
the confidence interval. 

To illustrate the advantage of data augmentation, we compared the proposed approach 
with two approaches performed on the common grid (i.e., depths up to 100 microns). The 
two approaches include the widely used PCA (Higdon et al. 2007) and the naive kriging 
approach with functional argument as an additional input. In PCA, four basis functions were 
identified corresponding to the largest eigen values which account for more than 95% of the 
variation in the data. To compare the performance, we revisited the six randomly selected 
profiles and computed the mean squared cross validation error (MSCV) averaged over the 
entire depth. The MSCV for the proposed kriging method is 49.11, whereas the MSCV 
of the PCA method is 69.51 and the naive kriging method without data augmentation is 
60.23. Although the PCA can be further improved by incorporating more basis functions, the 
improvement is marginal (e.g. with 10 basis functions, the MSCV is reduced only to 68.94.) 
The comparisons based on MSCV clearly show that the proposed kriging method provides 
better predictions than the other two methods. This is not surprising because without data 
augmentation, the information from 100 to 376 microns will be completely lost leading to 
severe extrapolation, which can be quite inaccurate. 

The fitted model can be used for optimizing the residual stresses. The exact objective of 
optimization depends on the application area of the machined components. In most cases, 
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Figure 5: Leave-one-out prediction and confidence interval 

the objective is to minimize the maximum tensile (i.e. maximum positive) residual stress. 
Thus, the objective is to find 

x* = min xu ... , 2g \max t , Xw y(x, t) j . (28) 

The maximization over depth (t) and locations (xio) provides the largest residual stress over 
the machined component. The optimal setting is given by £C* = (1.00, 11.38, 20.5, 11.49, 9.2, 
6.14, 13.11, 3.99, 9.12)'. Based on this setting, the predicted maximum residual stress is 58.84 
MPa, which is about 53% smaller than the observed smallest maximum residual stress, 124.26 
MPa, in the experiment. These results are reasonably close to that found using the original 
regular data, in which the optimal setting is x* = (1.00, 11.61, 19.87, 11.44, 10.48, 7.66, 
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12.63, 4.41, 9.35)' and the predicted maximum residual stress is 58.15 MPa. The predicted 
optimal residual stress profiles are illustrated in Figure [6] with the solid line indicating the 
prediction using the irregular data and the dash line indicating the prediction using the full 
regular data. 




Figure 6: Optimal residual stress profiles with the full regular data and the truncated irreg- 
ular data. 

We also performed sensitivity analysis based on the main effect plots and two-factor 
interaction plots discussed in (Welch et al. 1992). The three main effects of feed, cutting 
edge shape, and the cutting speed, appear to be most important. Their effects on the 
residual stress profile are shown in Figure [7| For better clarity, the residual stress profiles 
were plotted with depths smaller than 100 microns (as the depth increases, the residual 
stresses converge to 0). We can see that the surface residual stress (i.e., depth=0) increases 
with feed. This trend is consistent with actual observations in the process. In Figure ufb), 
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the hone edge tool leads to larger surface residual stress than with the chamfer edge tool. In 
Figure [7^c), the general effect of increasing the cutting speed is to cause the residual stress 
profile to become more tensile (or positive). This can be physically explained on the basis 
of the process physics and is attributed to increased thermal effects with increase in cutting 
speed. 



feed cutting edge shape cutting speed 




Figure 7: Significant main effects on the residual stress profile. 

5. SUMMARY AND CONCLUDING REMARKS 

Computer experiments with functional response are encountered in many engineering 
and scientific studies. Most available methods in the literature for analyzing computer ex- 
periments, however, focus on single outputs. Extending these methods to functional data 
analysis pose severe computational challenges. This article proposed a systematic method- 
ology for modeling functional outputs using kriging especially for the data observed on an 
irregular grid. The computational challenge is successfully tackled by the proposed two-stage 
model building procedure which incorporates the Kronecker product technique and a version 
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of the EM algorithm for estimation. A direct application of the standard EM algorithm was 
not possible. Therefore, we introduced a Gibbs sampling-based computationally efficient al- 
gorithm for estimating the missing data run-by-run. The proposed method is illustrated by 
a hard turning machining simulation experiment in which the functional response, residual 
stress, is collected over depth. Based on the fitted model, the effects of each factor on the 
residual stress profile are studied and optimal settings are identified. 

In this research, the uncertainty quantification is studied based on the plug-in approach, 
which underestimates the true variance. A possible extension is to develop a fully Bayesian 
approach by incorporating prior information from the mean coefficients and correlation pa- 
rameters so that the uncertainty about the unknown parameters can be taken into account. 
However, a direct application of the fully Bayesian approach is computationally infeasible 
because each MCMC sample involves a large matrix inversion. Faster Bayesian computing 
techniques should be considered to tackle this problem (Joseph 2012). 

A weakness of the current framework is the stationarity assumption employed in the krig- 
ing model. One approach to relax this assumption is to incorporate treed partitioning based 
on the idea of Chipman et al. (2002) and Gramacy and Lee (2008). Although additional 
computational efforts are required due to the recursive partitioning of trees, extension of this 
procedure is computationally tractable because the proposed algorithm can be implemented 
within each partition (leaf of the tree), which has less data. 
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The objective is to calculate conditional posterior f(ci\y, = f{ci\y i7 Ut-i), z (-i))- 

We have 

K /(l/<»l/(-i)»*(-i)|Ci)/(Ci)» 

and 

/(l/i,l/(-i),*(-i)|Ci) = /(2/(-i). z (-i)l?/i> c i)/(?/il c *)- 

Therefore, 



/(ci|l/i>l/(-0>«(-o) oc /(y H) , ^/(j/Jc;)/^) 

oc f{y ( -i), Zi-ijlvt, c i )f{c i )f{y i \c i )f{c i )/f{c i ). 



(29) 



Because y { C Cj, we have f{yt i \,z^i)\y i ,Ci) = f{yr i \,z^_ i )\ci) and equation (29) can be 
written as 



/(ci|2/i,2/(-i),2(-i)) oc /(!/(_<)*(_<) Icj)/^)/^!^)/^)/ /(Cf) 

oc f{<k\y(-i),z^)f{ci\y^/f{ci). 



(30) 



Since the three prior distributions on the right hand side of (30) are all normally distributed 



as given in (16), (17), and (18), we have 



/(ci|z/i>Z/(-i)> z (-;)) oc -|exp j(c; - C^'E^c* - £(_,-)) + (q - Q)'^ 1 (c i - C, 
oc -|exp |(ci - ryJTr 1 ^ - r^j, 



where r/j and Tj are given in (20) and (21). Therefore, the conditional distribution of 
/(Ci|2/j, y (_,•), follows. 



Appendix B: Proof of Proposition 2 
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First, it holds that 



By taking a factor out of (20 ) and assuming that (di, 
we have 



; j ' ' ' j 



■mxm ~r Zj(_j)Zj^ 



{Imxrn + S(_j)Sj 1 — E^j)!]^ ) 1 



zt 1 + a 



r' H) K)R 



(31) 



ik 



where C is independent of Zj's. For notational simplicity, we will denote [ • ] ., by [• ] in the 
remaining part of the proof. 



Consider a set of z\ that satisfy (|23|) simultaneously, i.e., z\ — >• we have 



{Imxm + ^(-i)^i 1 — ^(-*)^Ci) 1 



^ A: 



{Imxm + S(_i)S,- 1 — S(_ i )S Ci 1 ) 



2fc + C. 



(32) 



Subtracting (32) from (31) and assuming that e\ = z\ — Zj, we obtain 



+ ^2k=i+l dk 



[Imxm + S(_i)S^ 1 — X^-j)!]^ 1 ) 



Define 5 = max; J2k=i,k& ^ 
on % that the following result 



[-^mxm + S(_i)S i — S(_j)S Cj 1 ] 

holds: 



n - 1 



max, || e\ ||< 8 maxj || ||, j = 1,2 



. Now we show by induction 



(33) 



For % = 1, 



k J 



fe=i+l 
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and thus, (33) clearly holds. Assume that (33) holds for % = !,-•• ,1 — 1 and based on 



assumption (25), we have for i = I 



e\ || < 6 max/ || e\ 1 || J2k=i ^k 



w-i 



-n- 1 



[(Imxm + S(_/)E / 1 — S(_/)S C; 1 ] 



+ Y2=l+1 d k 



< 5 max, || || }2 k . 



11 -1 



< 5 max; || 



j-i 



The result in (33) implies that maxj || e\ ||< <P max, || e° || and max.j || ||— > as j — > oo. 
Therefore, z^'s converge to the values that satisfy equations (23) simultaneously. According 



to the result in Yee et al. (2002), z\ converge to the posterior mean E(zi\y,0 ) when 
m — > oo. 
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